crates.io: Use cache + origin request policies for webapp CloudFront #1069

Merged: marcoieni merged 1 commit into rust-lang:master from Turbo87:cloudfront-webapp-cache-policy, Jun 4, 2026

Turbo87 commented Jun 3, 2026

Member

Replace the legacy forwarded_values block with a dedicated cache policy and origin request policy. Cookies, User-Agent, Referer, X-Request-Id, and Authorization are forwarded to the origin without being part of the cache key, so authenticated and per-session requests no longer fragment the cache. Accept stays in the cache key for content negotiation, and Accept-Encoding is normalized via the compression settings. TTLs are unchanged (0/0/1y).


Created with the help of Claude Code, which:

  • analyzed the legacy forwarded_values cache key and identified that cookies, User-Agent, and Authorization were fragmenting the cache,
  • verified Fastly and CloudFront cache-key and Vary header behavior against the vendor docs, and
  • migrated the configuration to a cache policy and origin request policy.
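
The split described above, where some attributes are forwarded to the origin without entering the cache key, can be illustrated with a toy model. This is only a sketch: `cache_key`, `normalize_encoding`, and `FORWARDED_ONLY` are hypothetical names, the `legacy` flag stands in for the old `forwarded_values` behavior, and the Accept-Encoding normalization is a simplified stand-in rather than CloudFront's actual rules.

```python
# Toy model only: real CloudFront key derivation has more inputs and rules.
# Headers that the new setup forwards to the origin but keeps out of the key.
FORWARDED_ONLY = {"referer", "user-agent", "x-request-id", "authorization"}


def normalize_encoding(value):
    """Simplified stand-in: keep only br/gzip, in a fixed order."""
    offered = {part.split(";")[0].strip().lower() for part in value.split(",")}
    return ",".join(enc for enc in ("br", "gzip") if enc in offered)


def cache_key(path, query, headers, legacy):
    """Return a hashable cache key for one request.

    legacy=True mimics the old setup, where every forwarded header and the
    cookies were part of the key. legacy=False mimics the new cache policy,
    where they still reach the origin but are left out of the key.
    """
    h = {name.lower(): value for name, value in headers.items()}
    key = [
        ("path", path),
        ("query", query),
        ("accept", h.get("accept", "")),
        ("accept-encoding", normalize_encoding(h.get("accept-encoding", ""))),
    ]
    if legacy:
        for name in sorted(FORWARDED_ONLY | {"cookie"}):
            key.append((name, h.get(name, "")))
    return tuple(key)


anon = {"Accept": "text/html", "Accept-Encoding": "gzip, br"}
user = dict(anon, Cookie="session=abc", Authorization="Bearer tokenA")

# Old model: the logged-in request gets its own cache entry.
print(cache_key("/", "", anon, legacy=True) == cache_key("/", "", user, legacy=True))
# New model: both requests share one cache entry.
print(cache_key("/", "", anon, legacy=False) == cache_key("/", "", user, legacy=False))
```

In this model the first comparison prints `False` and the second prints `True`, which is the "no longer fragment the cache" effect in miniature. Accept remains in the key in both modes.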


Turbo87 commented Jun 4, 2026

Member Author

@rust-lang/infra any thoughts on this? should we try this out on staging?

Turbo87 changed the title from "crates-io: Use cache + origin request policies for webapp CloudFront" to "crates.io: Use cache + origin request policies for webapp CloudFront" Jun 4, 2026
marcoieni self-assigned this Jun 4, 2026
@marcoieni

Member

the changes look good. I applied to staging. Let me know if you want me to apply to prod 👍

Turbo87 commented Jun 4, 2026

Member Author

I had Claude verify the cache key behavior against https://cloudfront-app.crates.io/ (old) and https://cloudfront-app.staging.crates.io/ (new), using /_app/immutable/chunks/kNaey6uv.js as a cacheable example path:

  ┌─────────────────┬───────────────┬─────────────┬────────────────────────────────────────────────┐
  │    Attribute    │     Old       │    New      │                Expected by diff                │
  │                 │  (crates.io)  │  (staging)  │                                                │
  ├─────────────────┼───────────────┼─────────────┼────────────────────────────────────────────────┤
  │ Accept          │ IN            │ IN          │ in cache key (HTML vs JSON) ✓                  │
  ├─────────────────┼───────────────┼─────────────┼────────────────────────────────────────────────┤
  │ Accept-Encoding │ IN            │ IN          │ in key via brotli/gzip normalization ✓         │
  ├─────────────────┼───────────────┼─────────────┼────────────────────────────────────────────────┤
  │ Query string    │ IN            │ IN          │ in key (every fresh random ?probe-… was a MISS │
  │                 │               │             │  on both) ✓                                    │
  ├─────────────────┼───────────────┼─────────────┼────────────────────────────────────────────────┤
  │ Referer         │ IN            │ NOT         │ dropped from key → origin request policy ✓     │
  ├─────────────────┼───────────────┼─────────────┼────────────────────────────────────────────────┤
  │ User-Agent      │ IN            │ NOT         │ dropped from key → origin request policy ✓     │
  ├─────────────────┼───────────────┼─────────────┼────────────────────────────────────────────────┤
  │ X-Request-Id    │ IN            │ NOT         │ dropped from key → origin request policy ✓     │
  ├─────────────────┼───────────────┼─────────────┼────────────────────────────────────────────────┤
  │ Authorization   │ IN            │ NOT         │ dropped from key → origin request policy ✓     │
  ├─────────────────┼───────────────┼─────────────┼────────────────────────────────────────────────┤
  │ Cookie          │ IN            │ NOT         │ dropped from key (cookie_behavior=none in      │
  │                 │               │             │ cache policy) ✓                                │
  └─────────────────┴───────────────┴─────────────┴────────────────────────────────────────────────┘
Test Script
#!/usr/bin/env python3
"""Probe which request attributes are part of the CloudFront cache key.

Technique: carve out a private cache entry with a unique query-string token
(query string is in the cache key on both configs), warm it with value A (two
requests; the second should be a HIT), then flip to value B. A MISS on B means
the attribute contributes to the cache key; a HIT means it does not.
"""

import http.client
import random
import time

UA = "crates-cache-test (tobias@bieniek.cloud)"
PATH = "/_app/immutable/chunks/kNaey6uv.js"
DOMAINS = ["cloudfront-app.crates.io", "cloudfront-app.staging.crates.io"]


def get_xcache(domain, qs, headers):
    """Return the X-Cache header value for one request."""
    conn = http.client.HTTPSConnection(domain, timeout=20)
    h = {"User-Agent": UA}
    h.update(headers)
    conn.request("GET", f"{PATH}?{qs}", headers=h)
    resp = conn.getresponse()
    resp.read()
    xc = resp.getheader("X-Cache", "<none>")
    conn.close()
    return xc


def classify(b):
    """Map the X-Cache value of the B request to a cache-key verdict."""
    if b.startswith("Miss"):
        return "IN  cache key"
    if b.startswith("Hit") or b.startswith("RefreshHit"):
        return "NOT in cache key"
    return f"?? ({b})"


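def probe(name, a_headers, b_headers):
    """Warm a private cache entry with A, then check whether B hits it."""
    print(f"== {name} ==")
    for domain in DOMAINS:
        # A unique query string gives each probe its own cache entry.
        qs = f"probe-{random.randint(0, 10**12)}"
        a1 = get_xcache(domain, qs, a_headers)  # expected: Miss (fills the cache)
        time.sleep(1)
        a2 = get_xcache(domain, qs, a_headers)  # expected: Hit (entry is warm)
        b = get_xcache(domain, qs, b_headers)
        print(f"  {domain:<34} A=[{a1} | {a2}]  B=[{b}]  -> {classify(b)}")
    print()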


probe("Accept", {"Accept": "application/json"}, {"Accept": "text/html"})
probe("Referer", {"Referer": "https://a.example/"}, {"Referer": "https://b.example/"})
# both User-Agent values must be valid custom UAs, else crates.io returns 403
probe(
    "User-Agent",
    {"User-Agent": "crates-cache-test-A (tobias@bieniek.cloud)"},
    {"User-Agent": "crates-cache-test-B (tobias@bieniek.cloud)"},
)
probe("X-Request-Id", {"X-Request-Id": "aaaaaaaa"}, {"X-Request-Id": "bbbbbbbb"})
probe(
    "Authorization",
    {"Authorization": "Bearer tokenA"},
    {"Authorization": "Bearer tokenB"},
)
probe("Cookie", {"Cookie": "probe=a"}, {"Cookie": "probe=b"})
probe("Accept-Encoding", {"Accept-Encoding": "gzip"}, {"Accept-Encoding": "br"})

in other words: this looks like it works as intended and I guess we can put this onto prod :)
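
The probe script above classifies each attribute from the B response alone, which is only meaningful if the second A request was actually a HIT. A minimal sketch of a stricter verdict, assuming the usual CloudFront `X-Cache` values such as `Hit from cloudfront` and `Miss from cloudfront`; `is_hit` and `verdict` are hypothetical helpers, not part of the original script:

```python
def is_hit(x_cache):
    """True for the X-Cache values the probe script treats as a cache hit."""
    return x_cache.startswith(("Hit", "RefreshHit"))


def verdict(a2, b):
    """Classify one probe from the second A response and the B response.

    a2: X-Cache of the second request with value A (should be a hit).
    b:  X-Cache of the request with value B.
    """
    if not is_hit(a2):
        # The entry never warmed, so a Miss on B proves nothing.
        return "inconclusive (entry never warmed)"
    if b.startswith("Miss"):
        return "IN cache key"
    if is_hit(b):
        return "NOT in cache key"
    return f"?? ({b})"


print(verdict("Hit from cloudfront", "Miss from cloudfront"))
print(verdict("Hit from cloudfront", "Hit from cloudfront"))
print(verdict("Miss from cloudfront", "Miss from cloudfront"))
```

With this guard, a probe whose warm-up request missed (for example because the two A requests landed on different edge caches) is reported as inconclusive instead of as "IN cache key".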

@marcoieni marcoieni merged commit 2af0b1b into rust-lang:master Jun 4, 2026
4 checks passed
@marcoieni

Member

applied to prod 👍
